TrustRadius
Amazon EMR

Overview

What is Amazon EMR?

Amazon EMR is a cloud-native big data platform for processing vast amounts of data quickly, at scale. Using open source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi (Incubating), and Presto, coupled with the scalability…

Recent Reviews

Amazon EMR Review

7 out of 10
September 22, 2020
Incentivized
Amazon EMR is being used by our organization to simplify running big data frameworks, and provide the Amazon EMR highlights, product …


Product Details

What is Amazon EMR?

Amazon EMR Technical Details

Operating Systems: Unspecified
Mobile Application: No

Frequently Asked Questions

Amazon EMR is a cloud-native big data platform for processing vast amounts of data quickly, at scale. Using open source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi (Incubating), and Presto, coupled with the scalability of Amazon EC2 and the scalable storage of Amazon S3, EMR gives analytical teams the engines and elasticity to run petabyte-scale analysis.

Reviewers rate Support Rating highest, with a score of 9.

The most common users of Amazon EMR are from Enterprises (1,001+ employees).


Reviews and Ratings

(60)

Attribute Ratings

Reviews

(1-3 of 3)
José David Rodríguez Gómez | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
Incentivized
On-demand transient clusters for big data processing. I like that its availability across different cost tiers makes it very flexible for customers of different scales. It can come pre-installed with big data tools such as Hive, Spark, Pig, etc. Detailed cluster monitoring helps track several metrics, which in turn helps reduce cost.
  • Big data processing.
  • The resizing feature is good.
  • Ease of use and creating new clusters.
  • The user interface could use a facelift.
  • Overhead delay in starting clusters.
  • Big learning curve for someone who hasn't used a program like this before.
We use it to move processing that takes a few hours on EC2 onto a Spark-based EMR cluster, which completes the same processing within minutes rather than hours. It is easy to use and lets you choose between Hadoop and Spark. Processing time drops from 5-8 hours to 25-30 minutes compared with the EC2 instance, and by more in some cases.
  • EMR can execute the code using Spark or other cluster types such as Hadoop.
  • Execution time comes down to a few minutes as against a few hours running on either EC2 or other computing servers.
  • Easy to choose between Hadoop-based and Spark-based EMR clusters.
  • Reduced times of processing.
  • The platform is very useful for processing and storing big data.
  • No need to handle complex configuration of Big data platform.
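As a quick sanity check on the figures quoted in this review, the following sketch works out the speedup implied by going from 5-8 hours on EC2 to 25-30 minutes on EMR. The helper name `speedup` is made up for illustration, and pairing the extremes of the two ranges is an assumption; the review does not say which runs correspond to which times.

```python
# Rough speedup implied by the reviewer's figures (5-8 hours on EC2
# versus 25-30 minutes on a Spark-based EMR cluster).

def speedup(before_minutes: float, after_minutes: float) -> float:
    """Return how many times faster the 'after' run is than the 'before' run."""
    if after_minutes <= 0:
        raise ValueError("after_minutes must be positive")
    return before_minutes / after_minutes


ec2_minutes = (5 * 60, 8 * 60)  # 5-8 hours, as stated in the review
emr_minutes = (25, 30)          # 25-30 minutes, as stated in the review

# Assumption: pair the extremes of both ranges to bound the speedup.
lowest = speedup(ec2_minutes[0], emr_minutes[1])   # 300 / 30
highest = speedup(ec2_minutes[1], emr_minutes[0])  # 480 / 25

print(f"Implied speedup: roughly {lowest:.0f}x to {highest:.1f}x")
```

Under these assumptions the quoted times correspond to somewhere between a 10x and a 19x reduction in wall-clock time, before accounting for the cluster start-up delay the reviewer mentions.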
Thomas Young | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User
Incentivized
Amazon Elastic MapReduce is used by my department to produce big data analytics for certain clients. The software addresses data mining and predictive analytics for data sets that take a long time to process. The software is not used for econometric or other analytical evaluation because the size of the data sets does not lend itself to such analysis. The software is used almost exclusively for data mining and simple reporting for large data cases.
  • Amazon Elastic MapReduce works well for managing analyses that use multiple tools, such as Hadoop and Spark. If it were not for the fact that we use multiple tools, there would be less need for MapReduce.
  • MapReduce is always on. I've never had a problem getting data analyses to run on the system. It's simple to set up data mining projects.
  • Amazon Elastic MapReduce has no problems dealing with very large data sets. It processes them just fine. With that said, the outputs don't come instantaneously. It takes time.
  • The analytical processes generally run quicker with the standalone tools of Hadoop, Spark, and others. If you only use one big data tool and don't really need things simplified, then Elastic MapReduce is more of an overhead tool that doesn't add much value.
  • The analytical capabilities of Elastic MapReduce are nowhere near as complex or broad as non-big data tools. I would suggest not using the tool unless your data really is big data.
  • The machine learning capabilities of Elastic MapReduce (using the big data tools of Hadoop/Spark) are good but are not as easy to use as other machine learning tools.
Amazon Elastic MapReduce is useful in cases where two conditions are met. First, you are planning on using multiple big data tools simultaneously to analyze big data sets. Second, you need a tool that simplifies managing big data tools. If these two conditions are met, MapReduce does a great job. The user interface is simple. The program eliminates some programming requirements. The software also makes setting up big data analyses much easier. With these benefits acknowledged, MapReduce is not a good tool for "small" data analyses, given that there are other tools that do the job quicker and produce much more professional output. If you're on the fence, try out MapReduce alongside competing "small" data tools and see if you really need big data software.
  • Amazon Elastic MapReduce has had a positive ROI in the sense that it saved time managing big data projects where analysts were using different big data tools. Essentially, an increase in employee productivity.
  • Elastic MapReduce is not worth it in cases where you're just trying things out. You'll likely lose money unless you're sure that using MapReduce is a good idea.
  • Elastic MapReduce takes some time learning, although not too much. If the employee is less well-versed in big data analytics, the software is a high hill to climb that eats up employee time.
Perhaps the biggest advantage Amazon Elastic MapReduce has over competing big data management software is the user base. Elastic MapReduce, courtesy of its connection with Amazon, has a large user base to whom questions about functionality can be addressed. The software also has a very nice user interface. Additionally, Elastic MapReduce runs fairly quickly and the results are generally easier to manipulate. With this said, Elastic MapReduce is definitely neither the easiest nor the quickest tool for big data analytics.
Google Ad Manager, Google Analytics Premium, Google Cloud AI, Google Cloud Dataflow, Google Cloud Datastore, Google Cloud Storage, 6sense, MapR, IBM InfoSphere DataStage, IBM InfoSphere Data Replication, Domino, Datameer, Cloudera Data Science Workbench, Cloudera Manager, Google Cloud SQL
November 17, 2017

EMR review

Score 8 out of 10
Vetted Review
Verified User
Incentivized
EMR is being used by our department, not the whole organization. We use it as the infrastructure on which we run Spark jobs. Those jobs are mainly used for data I/O, data processing, and machine learning applications.
  • Ease of use and ease of setup
  • Autoscaling functionality
  • Integrated into the AWS environment
  • Cost overhead is a bit high
  • Limited versions of frameworks that can be used
Well suited if you quickly want to set up a distributed compute platform, such as Spark. But you have to be advanced enough that you really want to separate compute from data storage. For example, for certain applications a packaged solution such as an MPP database (e.g. Redshift) is much easier to set up than Spark on EMR and S3 with the appropriate file formats.
  • It was easy to set up initial versions of Spark on this
  • Still used as our compute platform as it's easy to manage
  • Certain times we forgot to shut down clusters and were overcharged
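One way to reduce the risk of forgotten clusters is to launch them so that they shut themselves down. The sketch below builds the request for boto3's EMR `run_job_flow` call in two hedged variants: a transient cluster that terminates once its steps finish (`KeepJobFlowAliveWhenNoSteps` set to `False`), or a long-running cluster with an idle auto-termination policy. The function names, cluster name, S3 path, instance types, instance count, and release label are all illustrative assumptions, not details from the review, and the default EMR roles are assumed to already exist in the account.

```python
from typing import Any, Dict, Optional


def transient_cluster_request(
    name: str,
    script_s3_uri: str,
    idle_timeout_seconds: Optional[int] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for boto3's EMR client run_job_flow call.

    With idle_timeout_seconds=None the cluster is transient: it terminates
    as soon as its steps have finished. Otherwise it stays up between steps
    but is terminated after being idle for the given number of seconds.
    """
    request: Dict[str, Any] = {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",        # assumed release label
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": "m5.xlarge",  # assumed instance types
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,                 # assumed cluster size
            # False means: shut down when there are no more steps to run.
            "KeepJobFlowAliveWhenNoSteps": idle_timeout_seconds is not None,
        },
        "Steps": [
            {
                "Name": "spark-job",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", "--deploy-mode", "cluster", script_s3_uri],
                },
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumes default roles exist
        "ServiceRole": "EMR_DefaultRole",
    }
    if idle_timeout_seconds is not None:
        request["AutoTerminationPolicy"] = {"IdleTimeout": idle_timeout_seconds}
    return request


def launch_cluster(request: Dict[str, Any], region: str = "us-east-1") -> str:
    """Submit the request to EMR; needs boto3 and valid AWS credentials."""
    import boto3  # imported here so building a request needs no AWS setup

    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**request)["JobFlowId"]
```

A call such as `launch_cluster(transient_cluster_request("nightly-job", "s3://example-bucket/job.py"))` would start a cluster that runs the single Spark step and then terminates; the bucket and script here are placeholders. Keeping request construction separate from submission also makes the termination settings easy to review before anything is billed.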
The alternatives to EMR are mainly Hadoop distributions owned by the 3 companies above. I have not used the other distributions, so it is difficult to comment, but the general tradeoff is that, at the cost of a longer setup time and more infrastructure management, you get more flexible versioning and potentially faster access to newer versions of some frameworks such as Spark.